Analyses of Swisscom data

Grid extras

Swisscom grid coordinates & IDs

Tile definitions were pulled from the API using query_swisscom_heatmaps_api.py.

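read_fun <- function(filename) {
  
  # Parse the JSON file and flatten the nested tile definitions
  data <- jsonlite::fromJSON(filename)
  data <- jsonlite::flatten(data$tiles) %>% 
    dplyr::as_tibble()
  
  # Derive the postal code (PLZ) from the file name;
  # the dot is escaped so that it matches a literal "."
  data$plz <- gsub("grid_|[.]json", "", filename)
  data$plz <- gsub("data/swisscom/", "", data$plz)
  
  return( data )
}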
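
As a quick check of the file-name handling in read_fun, the sketch below extracts the PLZ from two example paths. The paths are hypothetical and assume the files follow a grid_<PLZ>.json naming pattern; basename() is used here in place of the second gsub() call.

```r
# Hypothetical file paths, assuming a grid_<PLZ>.json naming pattern
files <- c("data/swisscom/grid_3005.json",
           "data/swisscom/grid_3098.json")

# Drop the directory, then strip the "grid_" prefix and ".json" suffix
plz <- gsub("^grid_|[.]json$", "", basename(files))

print(plz)
```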

doFuture::registerDoFuture()
future::plan("multisession", workers = 8)

grid <- plyr::ldply(.data = fs::dir_ls("data/swisscom/", 
                                       regexp = "[0-9][.]json$"),
                    .fun = read_fun,
                    .id = NULL,
                    .parallel = TRUE) %>% 
  as_tibble() %>% 
  distinct()

The analysis focuses on a test area around Bern city centre, covering the following postal codes (PLZ):

x <character> 
# total N=6965 valid N=6965 mean=3044.06 sd=37.39

Value |    N | Raw % | Valid % | Cum. %
---------------------------------------
 3005 |  198 |  2.84 |    2.84 |   2.84
 3006 |  604 |  8.67 |    8.67 |  11.51
 3007 |  254 |  3.65 |    3.65 |  15.16
 3008 |  445 |  6.39 |    6.39 |  21.55
 3010 |   28 |  0.40 |    0.40 |  21.95
 3011 |  138 |  1.98 |    1.98 |  23.93
 3012 |  581 |  8.34 |    8.34 |  32.28
 3013 |  176 |  2.53 |    2.53 |  34.80
 3014 |  366 |  5.25 |    5.25 |  40.06
 3018 |  590 |  8.47 |    8.47 |  48.53
 3027 |  720 | 10.34 |   10.34 |  58.87
 3073 |  509 |  7.31 |    7.31 |  66.17
 3074 |  389 |  5.59 |    5.59 |  71.76
 3084 |  526 |  7.55 |    7.55 |  79.31
 3095 |  152 |  2.18 |    2.18 |  81.49
 3097 |  182 |  2.61 |    2.61 |  84.11
 3098 | 1107 | 15.89 |   15.89 | 100.00
 <NA> |    0 |  0.00 |    <NA> |   <NA>

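Grid points were defined using the lower-left corner coordinates of each tile. They were then shifted by about 50 m east and north (51 m east and 50 m north in the code below) so that they sit at the tile centres and align better with the country grid.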

grid_sf <- grid %>% 
  st_as_sf(coords = c("ll.x", "ll.y"), 
           crs = 4326,
           remove = TRUE) %>% 
  st_transform(21781) %>% 
  mutate(x = st_coordinates(.)[, 1],
         y = st_coordinates(.)[, 2]) %>% 
  select(-ur.x, -ur.y)

# shifting by 50m to the centre
grid_sf_50 <- grid_sf %>% 
  st_drop_geometry() %>% 
  mutate(x = as.integer(as.integer(x) + 51), # why on earth 1?
         y = as.integer(as.integer(y) + 50)) %>% 
  st_as_sf(coords = c("x", "y"), 
           crs = 21781,
           remove = FALSE) 
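
The arithmetic behind the shift can be illustrated with a minimal base R sketch. It assumes tiles with a 100 m edge, so that adding half the edge length to the lower-left corner gives the tile centre; the coordinates are made-up LV03 values, and the sketch uses a symmetric 50 m offset rather than the 51 m applied to x above.

```r
# Made-up lower-left corners in LV03 metres (EPSG:21781)
ll <- data.frame(x = c(600000, 600100),
                 y = c(200000, 200000))

# Assumed tile edge length in metres (not confirmed by the source)
tile_size <- 100

# Shift each corner by half a tile to obtain the tile centre
centre <- transform(ll,
                    x = x + tile_size / 2,
                    y = y + tile_size / 2)

print(centre)
```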

Grid derived with Swisscom offset

Swisscom points were linked to the country grid derived in 01.Rmd. This provides access to the tile ID variable, which is needed to link the grid to the Heatmap API outputs.

bern_plz <- 
  read_rds("data/grid/country.Rds") %>% 
  st_join(grid_sf_50,
          left = FALSE)

write_rds(bern_plz, "data/grid/bern_plz.Rds")
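
To show what st_join() with left = FALSE does here, the following sketch joins two toy grid cells to two toy points. It assumes the country grid consists of square polygons; cell_id, the plz values and all coordinates are hypothetical. With left = FALSE the join is an inner join, so cells that contain no point are dropped.

```r
library(sf)

# Helper building a square cell polygon from its lower-left corner
make_cell <- function(x0, y0, size = 100) {
  st_polygon(list(rbind(c(x0, y0),
                        c(x0 + size, y0),
                        c(x0 + size, y0 + size),
                        c(x0, y0 + size),
                        c(x0, y0))))
}

# Two adjacent toy cells with hypothetical IDs
cells <- st_sf(cell_id = c("A", "B"),
               geometry = st_sfc(make_cell(600000, 200000),
                                 make_cell(600100, 200000),
                                 crs = 21781))

# Two toy points: one in the centre of cell A, one outside both cells
pts <- st_as_sf(data.frame(plz = c("3005", "3006"),
                           x = c(600050, 600500),
                           y = c(200050, 200050)),
                coords = c("x", "y"),
                crs = 21781)

# Inner spatial join (default predicate: st_intersects)
joined <- st_join(cells, pts, left = FALSE)

print(st_drop_geometry(joined))
```

Only cell A is kept, because cell B contains no point and the second point falls outside both cells.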

Study area coverage

Duplicate cells

Some grid cells are duplicated because they overlap two (or possibly more) PLZs and were therefore returned once for each PLZ.

x <lgl> 
# total N=6965 valid N=6965 mean=0.11 sd=0.31

Value |    N | Raw % | Valid % | Cum. %
---------------------------------------
FALSE | 6232 | 89.48 |   89.48 |  89.48
TRUE  |  733 | 10.52 |   10.52 | 100.00
<NA>  |    0 |  0.00 |    <NA> |   <NA>

Example:

They do have unique IDs, so they can easily be excluded in order to create correct visualizations (see #8). However, analyses based on PLZs, particularly those aggregating data, would have to determine the correct assignment of grid cells to PLZs, perhaps by using a (population-weighted?) centroid or something similar.
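
A minimal sketch of how such duplicates could be flagged and dropped by ID, using base R only. The column name tile_id and the toy values are hypothetical, and this is not necessarily how the flag tabulated above was computed. Keeping the first occurrence is an arbitrary choice that is adequate for mapping but does not settle which PLZ a cell belongs to.

```r
# Toy data: tile 2 overlaps two PLZs and is returned twice
cells <- data.frame(tile_id = c(1, 2, 2, 3),
                    plz = c("3005", "3005", "3006", "3006"))

# Flag every row whose tile ID occurs more than once
cells$dupe <- duplicated(cells$tile_id) |
  duplicated(cells$tile_id, fromLast = TRUE)

# Keep one row per tile (first occurrence, an arbitrary choice)
unique_cells <- cells[!duplicated(cells$tile_id), ]

print(table(cells$dupe))
print(unique_cells)
```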